A Literature Survey on Domain Adaptation of Statistical Classifiers
Abstract
Domain adaptation, especially in natural language processing, has only recently begun to attract substantial attention [Daumé III and Marcu, 2006; Blitzer et al., 2006; Ben-David et al., 2007; Daumé III, 2007; Satpal and Sarawagi, 2007]. However, some special cases of the problem have been studied earlier under different names, such as class imbalance [Japkowicz and Stephen, 2002], covariate shift [Shimodaira, 2000], and sample selection bias [Heckman, 1979]. There are also well-studied machine learning problems that are closely related but not equivalent to domain adaptation, including multi-task learning [Caruana, 1997] and semi-supervised learning [Chapelle et al., 2006]. In this literature survey, we review existing work related to domain adaptation in both the machine learning and the natural language processing communities. Because this relatively new topic continues to attract attention, our survey is necessarily incomplete. Nevertheless, we try to cover the major lines of work that we are aware of as of the time of writing, and the survey will be updated regularly. The goal of this survey is twofold. First, existing studies on domain adaptation appear very different from one another, and different terms are used to refer to the problem; no survey so far has connected these studies. This survey therefore tries to organize the existing work systematically and to draw a big picture of the domain adaptation problem and its possible solutions. Second, a systematic survey exposes the limitations of current work and points out promising directions to explore.
Similar resources
Active Learning for Cross-domain Sentiment Classification
In the literature, various approaches have been proposed to address the domain adaptation problem in sentiment classification (also called cross-domain sentiment classification). However, adaptation performance normally suffers considerably when the data distributions in the source and target domains differ significantly. In this paper, we propose performing active learning for cross-domain sentime...
Domain Adaptation for Statistical Classifiers
The most basic assumption used in statistical learning theory is that training data and test data are drawn from the same underlying distribution. Unfortunately, in many applications, the “in-domain” test data is drawn from a distribution that is related, but not identical, to the “out-of-domain” distribution of the training data. We consider the common case in which labeled out-of-domain data ...
Bagging-based System Combination for Domain Adaptation
Domain adaptation plays an important role in multi-domain SMT. Conventional approaches usually resort to statistical classifiers, but they require annotated monolingual data in different domains, which may not be available in some cases. We instead propose a simple but effective bagging-based approach without using any annotated data. Large-scale experiments show that our new method improves tr...
Deep Unsupervised Domain Adaptation for Image Classification via Low Rank Representation Learning
Domain adaptation is a powerful technique when a large amount of labeled data with similar attributes is available in different domains. In real-world applications, there is a huge amount of data, but most of it is unlabeled. It is effective in image classification, where it is expensive and time-consuming to obtain adequate labeled data. We propose a novel method named DALRRL, which consists of deep ...
Bootstrapping polarity classifiers with rule-based classification
In this article, we examine the effectiveness of bootstrapping supervised machine-learning polarity classifiers with the help of a domain-independent rule-based classifier that relies on a lexical resource, i.e., a polarity lexicon and a set of linguistic rules. The benefit of this method is that, although no labeled training data are required, it allows a classifier to capture in-domain knowledge ...
Publication date: 2007